In practice, it is easier to construct hashing algorithms which operate on relatively small, fixed input lengths, whilst still keeping the output length even smaller (
In particular, given a compression function
The construction is similar to a block cipher in the sense that the message
This means that the message
When the message length
The number of bits reserved for encoding the message length $l_{mb}$ is fixed for a given Merkle-Damgård construction. Usually $l_e$ is 64 bits, resulting in a maximum message length of $2^{64} - 1$, which is quite a reasonable limit.
After padding, the actual hash algorithm begins by appending an initialisation vector (IV) of length
The SHA256 hash function uses the following 256-bit IV (the value is in hex):
$$IV := \texttt{0x6A09E667BB67AE853C6EF372A54FF53A510E527F9B05688C1F83D9AB5BE0CD19}$$
This initialisation vector serves as the initial chaining variable
The reason why the Merkle-Damgård transform is used ubiquitously is the fact that it preserves collision resistance.
If the compression function $H'$ is collision resistant, then so is the Merkle-Damgård function $H$.
Suppose, towards contradiction that there is an efficient collision finder $\mathcal{A}$ which can find a collision in $H$ with non-negligible probability. Let $x$ and $x'$ be two inputs of lengths $L$ and $L'$, respectively, such that $H(x) = H(x')$. Let $x_1, x_2, ..., x_q$ be the $q$ blocks which $x$ is divided into, and, let $x_1', x_2', ..., x_{q'}'$ be the $q'$ blocks which $x'$ is divided into. Similarly, let $t_1, t_2, ..., t_q, t_{q+1}$ and $t_1', t_2', ..., t_{q'}', t_{q'+1}'$ be the chaining variables used at each iteration of the hashing of $x$ and $x'$, respectively (remember that the chaining variables $t_{q+1}$ and $t_{q'+1}'$ are also the output of $H$).
**Case 1:** If the two inputs have different lengths, i.e. $L \ne L'$, then the hash $H(x)$ is $t_{q+1} \coloneqq H'(x_q||t_{q})$ and the hash $H(x')$ is $t_{q'+1}' \coloneqq H'(x_q'||t_{q'}')$. However, $H(x) = H(x')$ means that $H'(x_q||t_{q}) = H'(x_q'||t_{q'}')$ which is a contradiction because $L\ne L'$ and so $x_q \ne x_{q'}'$ (remember that the length is appended to the message when padding) - we have found two different inputs which cause a collision in the collision resistant $H'$.
**Case 2:** If the two inputs have the same length, i.e. $L = L'$, then they are also divided into the same number of blocks $q = q'$. Let $I_i \coloneqq x_i||t_i$ denote the $i$-th input passed to $H'$ when computing $H(x)$, and let $I_i' \coloneqq x_i'||t_i'$ denote the $i$-th input passed to $H'$ when computing $H(x')$. Additionally, we will denote the output of $H(x)$ as $I_{q+1} \coloneqq t_{q+1}$, and we will denote the output of $H(x')$ as $I_{q'+1}' \coloneqq t_{q'+1}'$.
Now, $H(x) = H(x')$ and so $I_{q+1} = t_{q+1} = I_{q'+1}' = t_{q'+1}'$. This can only happen if $I_q = I_{q'}'$ or if $(I_q, I_{q'}')$ is a collision pair for $H'$ and the same logic propagates backwards - in general, $H'(I_i) = H'(I_i')$ can be true only if $I_i = I_i'$ or if $(I_i, I_{i'}')$ is a collision pair for $H'$. The inputs $x, x'$ are a collision pair for $H$ which means that $x \ne x'$ and so there *must* be some index $j$ for which $x_j \ne x_j'$ which means for sure that $I_j \ne I_j'$ and so $(I_j, I_j')$ turn out to be a collision in $H'$, which is a contradiction.